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1.  INTRODUCTION 


Divide-and-conquer  is  one  of  the  most  frequently  used  methods  for  the  design  of  fast 
algorithms.  The  most  common  application  of  the  technique  involves  breaking  a problem  of 
size  N into  two  subproblcms  of  size  N/2,  solving  these  subproblems,  then  doing  work 
proportional  to  N to  "marry"  the  partial  answers  into  a solution  for  the  entire  problem! 
this  scheme  leads  to  algorithms  of  CKN  log  N)  worst-case  time  complexity.  In  this  paper 
we  investigate  a similar  divide-and-conquer  technique  which  can  be  used  to  construct 
algorithms  with  linear  average-case  time  complexity. 

The  problem  of  determining  the  convex  hull  of  a set  of  points  in  two  and  three 
dimensions  has  produced  a rash  of  recent  papers  [4,  8,  15,  16],  all  containing  algorithms 
with  0<N  log  N)  worst -case  performance.  That  this  i6  optimal  follows  from  the  fact  that  in 
the  worst  case  all  N points  may  be  vertices  of  the  convex  hull,  and  since  the  vertices  of  a 
convex  polygon  occur  in  sorted  angular  order  about  each  interior  point,  any  convex  hull 
algorithm  must  be  able  to  sort  [14,  8].  If  the  boundary  of  the  convex  hull  contains  very 
few  points,  however,  this  lower  bound  does  not  apply,  and  a faster  algorithm  may  be 
possible.  The  algorithm  of  Jarvis  [5]  runs  in  time  0(hN),  where  h is  the  number  of  actual 
hull  vertices,  and  thus  takes  advantage  of  the  fact  that  h may  be  smalt.  Unfortunately,  if 
h is  not  known  in  advance,  the  algorithm  may  take  quadratic  time.  Eddy  [2]  has 
developed  a hull  algorithm  analogous  to  QUICKSORT  that  has  good  empirical  performance 
but  also  has  a quadratic  worst  case.  Our  goal  in  thiq  paper  is  to  use  information  about 
the  probability  distribution  of  h to  obtain  an  algorithm  with  0(N)  expected  running  time 
without  sacrificing  0(N  log  N)  worst-case  behavior. 

This  new  convex  hull  algorithm  leads  to  linear  expected-time  solutions  to  a host  of 
other  geometry  problems  that  are  related  to  hull-finding.  Among  these  are  determining 
the  greatest  distance  between  two  points  of  a set,  the  smallest  circle  enclosing  a set,  and  . ■ 

constructing  linear  pattern  classifiers.  Analogous  techniques  yield  a linear  average-case 

algorithm  for  linear  programming  in  two  variables. 
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The  divide -and-conquer  scheme  we  use  to  achieve  the  above  results  seems  to  be  a 
general  method  suitable  for  the  construction  of  fast  average-case  algorithms.  It  achieves 
fast  expected  time  at  the  cost  of  making  only  relatively  weak  assumptions  about  the 
underlying  probability  distribution  of  the  inputs.  Whereas  many  fast  average-case 
algorithms  display  poor  worst-case  behavior  (QUICKSORT,  for  examploi  sea  [13]),  those 
that  we  give  in  this  paper  have  optimal  worst-case  performance.  These  algorithms  are 
not  merely  of  asymptotic  interest  — they  are  faster  than  provious  methods  even  for 
very  small  problem  sizes  (N  > 40,  for  example). 

In  reading  this  paper,  one  must  be  very  careful  to  keep  in  mind  the  distinction 
between  worst-case  and  average-case  analyses.  For  example,  while  any  convex  hull 
algorithm  must  run  in  time  ft(N  Ig  N)  for  some  inputs^,  we  will  give  an  algorithm  with 
linear  expected  running  time  (for  some  distribution  of  inputs).  Notice  that  there  is  no 
contradiction  between  a worst-case  lower  bound  of  f)(N  ig  N)  and  an  average-case 
upper  bound  of  0(N). 

Basic  results  from  stochastic  geometry  are  described  in  Section  2;  these  results  form 
the  basis  of  our  probabilistic  analysis  of  the  algorithms  presented.  In  Section  3 we  give 
a fast  expected-time  algorithm  for  finding  convex  hulls  in  the  plane  and  investigate  in 
detail  the  schema  used  in  the  algorithm.  Section  4 shows  how  this  method  can  be  applied 
to  other  problems  and  used  as  a building  block  for  developing  additional  fast  expected- 
time algorithms.  Section  5 contains  suggestions  for  further  work  along  these  lines. 


2.  RESULTS  FROM  STOCHASTIC  GEOMETRY 


Stochastic  geometry  deals  with  the  properties  of  random  sets  of  points,  lines  and  other 
geometric  objects  and  is  an  essential  tool  for  analyzing  the  average  case  of  geometric 
algorithms.  Many  phenomena  in  geometrical  probability  are  counter-intuitive  and  difficult 
to  explain  without  the  tools  of  probabilistic  measure  theory.  For  example,  the  statement, 
"Choose  N points  at  random  in  the  plane”,  is  meaningless  without  a precise  specification 
of  distribution  from  which  the  points  aro  to  be  chosen.  Furthermore,  not  all  conceivable 
distributions  satisfy  the  axioms  of  probability.  Points  can  be  chosen  uniformly  in  the 
plane  only  from  a set  of  bounded  Lebesgue  measure  [6],  so  the  intuitively  attractive 
notion  of  a uniform  random  selection  from  the  whole  plane  must  be  discarded. 

The  problem  of  determining  h(N),  the  expected  number  of  vertices  of  the  convex  hull 
of  N points,  has  received  a good  deal  of  attention  [I,  3,  9,  II];  a summary  of  this  work 
may  bo  found  in  [12]  We  now  quote  several  results  that  will  be  used  later  in  analyzing 
our  algorithms: 

Theorem  1.  (R6nyi  and  Sulanke  [11])  If  N points  are  chosen  uniformly 
and  independently  at  random  in  the  plane  from  a convex  r-gon,  then 
as  N -»  on, 


2W.  aay  that  f(N)  • ftf(N))  if  f(N)/|(N)  ia  bountM  balow  by  aoiaa  nontrro  conalaM  C aa  N 4 a. 
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( 1 - Euler’s  constant) 


h(N)  - (2r/3)  ( t ♦ logeN)  ♦ CKI>  . 

Theorem  2.  (Raynaud  [9])  If  N points  are  chosen  uniformly  and 
independently  at  random  from  the  interior  of  a k-dimensional 
hypersphere,  then  as  N -»  oo,  f(N),  the  expected  number  of  faces  of 
the  convex  hull,  is  given  asymptotically  by 

f(N)  - (W/fc-DAk+i))  . 

Since  with  probability  one  each  face  of  the  hull  is  simplicial  and  thus 
is  determined  by  k vertices,  Theorem  2 implies  that 

h(N)  ■ 0(Nj^)  , for  points  chosen  uniformly  in  a circle,  end 
h(N)  - 0(l\r'2)  , for  points  chosen  uniformly  in  a sphere. 

Thus  in  any  dimension,  for  points  in  a hypersphere,  the  expected  number 
of  hull  vertices  is  bounded  above  by  NP,  for  some  p<l. 

Theorem  3.  (Raynaud  [9])  If  N points  are  chosen  independently 
from  a normal  distribution  in  k dimensions,  then  as  N -*  oo  the 
asymptotic  behavior  of  h(N)  is  given  by 

h(N)  - 0(  (log  N)(k-l)/2  ) . 


A useful  connection  can  be  established  between  the  stochastic  properties  of  convex 
hulls  and  the  expected  number  of  maximal  vectors  in  a random  6et.  A maximal  vector  is 
one  that  is  not  less  than  any  other  in  all  components.  Under  very  general  conditions  the 
expected  number  of  maximal  vectors  in  a sat  is  quite  small: 

Theorem  4.  (Kung,  Schkolnick,  Thompson  [7])  If  N k-dimensional  vectors 
are  chosen  such  that  their  components  are  distributed  independently, 
then  A(N,k),  the  expected  number  of  maximal  vectors,  is  bounded  by 

A(N,k)  s (logeN)K_1  for  N > 3 . 

Note  that  a vertex  of  the  convex  hull  of  a finite  k-dimensional  set  is  maximal  for  some 
assignment  of  plus  and  minus  signs  to  all  coordinates  of  its  points.  This  implies  that  for 
distributions  satisfying  the  independence  assumption  of  Theorem  4,  the  expected  number 
of  vertices  of  the  convex  hull  is  bounded  by 

E(h)  s 2k(logeN)k-1  - 2 (2  logeN)k-l  . 

The  multivariate  normal  of  Theorem  3,  the  multivariate  exponential,  and  the  uniform 
distribution  over  a hypercube  all  satisfy  the  independence-of-components  assumption. 
The  qualitative  behavior  of  the  hulls  of  random  sets  may  be  understood  intuitively  as 
follows:  for  uniform  sampling  within  any  bounded  figure  F,  the  hull  of  a random  set  will 
tend  to  assume  the  shape  of  the  boundary  of  F.  If  F is  a polygon,  points  accumulating  in 
the  "corners"  will  cause  the  resulting  hull  to  have  very  few  vertices.  Because  the  circle 
has  no  corners,  the  expected  number  of  hull  vertices  is  comparatively  high.  It  is 
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reasonable  that  only  some  small  fraction  of  the  sample  points  should  survive  as  hull 
vertices,  but  in  all  of  the  above  theorems  the  order  of  h(N)  is  sublirtear.  Informally  we 
may  account  for  this  by  noting  that  the  hull  is  a manifold  of  strictly  lower  dimension  than 
the  set  from  which  the  points  aro  being  chosen.  If  this  is  not  true,  we  may  hove  h(N)  - 
CXN).  For  example,  if  N points  are  selected  uniformly  on  the  boundary  of  a circle,  then 
h(N)  - N.  As  we  shall  see  in  the  next  section,  the  only  assumption  about  the  distribution 
of  points  that  needs  to  bo  made  in  order  to  obtain  a linear  expected-time  algorithm  is 
that  h(N)  ■ (XNP),  for  some  p<l. 


3.  CONVEX  HULLS  IN  THE  PLANE 


The  fast  convex  hull  algorithm  is  easily  described  as  a recursive  procedure:  If  N,  the 
number  of  given  points,  is  less  than  some  constant  C,  then  the  procedure  calculates  the 
hull  by  some  straightforward  method  and  returns.  If  N is  large,  though,  the  procedure 
first  divides  the  N points  into  two  subsets  of  approximately  M/2  points  each  by  a method 
which  ensures  that  the  resulting  subproblems  are  random.  It  then  finds  the  convex  hulls 
of  tho  random  subproblems  recursively,  which  will  take  expected  time  2T(N/2),  since  the 
subproblems  are  of  the  same  form  as  the  original.  The  result  of  each  of  the  recursive 
calls  is  a convex  polygon  whose  expected  number  of  vertices  is  (XNP),  with  p<l.  The  hull 
of  tho  given  set  is  now  just  the  hull  of  the  union  of  the  hulls  found  in  the  subproblems. 
Shamos  [15]  has  given  an  algorithm  to  find  the  hull  of  the  union  of  two  convex  polygons 
in  time  proportional  to  tho  total  number  of  vertices  of  both.  We  may  use  this  algorithm 
to  merge  the  results  of  the  subproblems  in  expected  time  (XN?).  The  average  running 
time  of  this  algorithm  thus  obeys  the  recurrence 

(1)  T(N)  - 2T(N/2)  ♦ CXNP)  , 

whose  solution,  for  p<I,  is  T(N)  ■ 0(N).  Thus  we  have  shown  that  the  algorithm  runs  in 
linear  expected  time  for  point  sets  satisfying  the  assumptions  made  in  Section  2. 
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We  assumed  above  two  important  properties  about  the  division  step  of  the  divide -and- 
conquer  algorithm:  that  it  can  be  accomplished  in  constant  time  and  that  the  points  in  the 
subprobleme  obey  the  same  probability  distribution  as  do  the  original  points.  A division 
step  with  these  properties  can  easily  be  implemented  on  a RAM  by  storing  the  points  in  a 
two  by  N array  of  cartesian  coordinates.  Each  point  is  initially  assigned  a random 
location  in  the  array  and  a subset  of  the  points  is  represented  as  a pair  of  integers 
which  define  (he  left  and  right  endpoints  of  a segment  of  the  array.  Division  into  further 
subsets  can  be  accomplished  by  taking  the  arithmetic  mean  of  the  endpoints  as  defining 
two  new  segments,  etc.;  note  that  the  division  preserves  randomness.  In  implementing 
this  algorithm  recursively,  it  is  crucial  to  pass  only  pointers  in  the  subroutine  calls.  If 
entire  subproblems  are  passed,  equation  (1)  no  longer  applies  and  an  N log  N algorithm 
will  result,  no  matter  how  few  points  are  on  the  convex  hull. 

Let  us  now  note  the  features  of  the  above  algorithm  that  give  it  linear  expected  time. 
First,  the  expected  size  of  the  output  is  small.  Second,  solutions  to  the  random 
subproblems  can  be  married  quickly  to  form  a solution  to  the  total  problem.  Note  that 
the  algorithm  also  has  optimal  worst-case  performance.  Since  the  largest  hull  that  can  be 
returned  by  a subproblcm  is  of  size  N,  we  always  have 
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(2) 


T(N)  i 2T(N/2)  + CXN)  , 


whose  solution  is  T(N)  S (XN  log  N).  We  can  use  this  algorithm  as  a paradigm  by  which  to' 
create  others  with  linear  expected  time  and  optimal  worst-case  behavior. 


4.  FURTHER  EXAMPLES 


The  first  simple  extension  of  the  algorithm  of  Section  3 gives  a linear  expected-time 
algorithm  for  the  convex  hull  of  a set  of  points  in  three  dimensions.  Preparata  and  Hong, 
in  an  important  recent  paper  [8],  have  shown  that  the  hull  of  the  union  of  two  disjoint 
convex  three-dimensional  polyhodra  can  be  found  in  time  that  in  the  worst  case  is  only 
linear  in  the  total  number  of  vertices.  Their  algorithm  makes  no  essential  use  of  the  fact 
that  the  polyhodra  arc  disjoint  and  can  be  readily  modified  to  include  the  case  in  which 
the  intersection  is  nonempty.  If  the  points  are  drawn  from  a distribution  satisfying  the 
assumptions  of  Section  2,  then  the  recurrence  relation  (1)  applies  and  we  again  have  a 
linear  expected-time  algorithm. 

Many  geometric  algorithms  are  based  on  finding  convex  hulls.  For  example,  the 
diameter  of  a set  (distance  between  its  two  farthest  points)  is  always  realized  by  two 
vertices  of  the  hull.  Furthermore,  these  points  con  be  found  in  linear  time  (in  two 
dimensions)  once  the  convex  hull  is  available  [14].  We  thus  immediately  have  a linear 
expected-time  diameter  algorithm.  Somewhat  more  complicated  is  the  problem  of 
determining  the  smallest  circle  enclosing  a plane  set  of  points.  This  is  a classical  problem 
with  an  extensive  literature.  An  0(N  log  N)  worst-case  algorithm  is  given  in  [16].  It  is 
elementary  to  show  that  the  two  or  three  points  determining  this  circle  are  vertices  of 
the  convex  hull.  If  we  first  find  the  hull  with  a linear  expected-time  algorithm,  the  time 
required  for  the  remaining  step  (finding  the  circle)  is  not  linear  in  the  number  of  hull 
vertices.  If  E(h)  is  the  expected  number  of  hull  vertices,  we  need  to  know  E(h  log  h)  to 
complete  the  analysis.  Note  that  we  always  have  1 S h SN  and 

N 

E(h)  - I iPj 
i-1 

where  p|  is  the  probability  that  h»i.  Now,  since  log  i s log  N, 

N 

E(h  log  h)  « 2 (i  log  i)  Pj 
i-1 

i (log  N)  E(h)  . 

Thus,  if  E(h)  - 0(NP),  p<l,  then  E(h  log  h)  - OfN*1),  for  some  q<l.  We  may  therefore  find 
the  smallest  circle  enclosing  a plane  set  in  linear  average  time. 

In  general,  determining  expectation  values  of  functions  of  h is  a difficult  problem  and 
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we  often  must  be  satisfied  with  upper  bounds.  The  largest  area  triangle  determined  by 
three  points  of  a set  of  N points  in  the  plane  can  be  found  in  time  that  is  quadratic  in  the 
numbor  of  vertices  of  the  hull  [15}  In  order  to  be  able  to  calculate  the  average-case 
behavior  of  the  algorithm,  we  must  compute  E(h^).  If  E(h)  - OfNf*),  then  certainly 
E(h‘)  | OfNP*1).  Applying  Theorem  2,  we  may  find  the  largest  determined  triangle  in 
CXNa'3)  expected  time  in  two  dimensions,  and  this  bound  is  highly  pessimistic. 

Theorem  4 leads  directly  to  a linear  expected-time  algorithm  for  finding  the  maxima  of 
N k-dimensional  vectors  whose  coordinates  are  chosen  independently.  It  is  only 
necessary  to  remark  that  the  marriage  step  of  the  divide -and -conquer  algorithm  finds  the 
common  maxima  of  two  subproblems  of  size  N/2,  each  of  which  has  very  few  maxima,  on 
the  average. 

We  often  observe  that  the  performance  of  an  algorithm  is  much  better  than  its  worst- 
case  lower  bounds  would  lead  us  to  expect;  the  Simplex  algorithm  for  linear  programming 
is  a striking  example  of  this  phenomenon.  Fast  as  Simplex  is,  however,  it  is  known  not  to 
be  optimal  for  problems  with  small  numbers  of  variables,  and  a divide -and-conquer 
approach  can  be  used  to  advantage  [17]:  the  feasible  region  of  a two-variable  problem  i6 
the  intersection  of  the  half-planes  determined  by  the  linear  constraints.  If  we  denote  the 
i'"  half-plane  by  Hjf  then  we  want  to  form 

H|  ft  H2  n ...  n Hn  . 

Since  the  intersection  operator  is  associative,  this  may  be  rearranged  as 

( Hj  n ...  n ) 0 ( Hj^2*1  n • • • h Hjg  ) . 

Each  term  is  an  intersection  of  N/2  half -planes,  and  is  thus  a convex  polygonal  region  of 
at  most  N/2  vertices.  The  intersection  of  two  such  figures  can  be  found  in  linear  time  at 
worst  [14],  so  equation  (2)  describes  tho  worst-case  behavior  of  tho  algorithm.  We  may 
thus  find  the  intersection  of  N half-planes  in  0(N  log  N)  time.  If  many  of  the  half-planes 
are  redundant,  though,  the  final  intersection  will  have  very  few  vertices,  and  we  may 
take  advantage  of  this  fact  to  develop  a better  algorithm.  Suppose  that  Kq  is  a bounded 
convex  region  of  the  plane  that  contains  another  convex  region  Kj.  If  N lines  Lj  are 
drawn  independently  and  at  random  to  meet  Kq  but  not  Kj,  and  we  define  to  be  the 
closed  half-plane  bounded  by  Lj  that  contains  Kj,  consider  E(v),  the  expected  numbor  of 
vertices  of  the  intersection  of  all  the  Hj.  Preliminary  results  were  obtained  by  TOnyi  and 
Sulanke  [10]  and  Ziczold  [18]  has  shown  by  duality  that  E(v)  is  of  the  same  asymptotic 
order  as  the  expected  numbor  of  points  on  the  hull  of  a set  of  N points  drawn  uniformly 
within  Kj.  If  Kj  shrinks  to  a point,  then  E(v)  approaches  the  constant  rt^/2.  In  any 
event,  under  fairly  conservative  assumptions  we  will  have  E(v)  - OfN**),  p<l,  and  a linear 
average-case  algorithm  for  intersecting  N half-planes  results.  This  leads  immediately  to 
an  (XN)  expected-time  algorithm  for  linear  programming  in  two  variables  and  for  finding 
the  kernel  of  a polygon  [14} 
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It  is  natural  to  try  lo  extend  the  results  of  this  paper  to  higher -dimensional  problems 
in  geometry  and  to  other  problem  domains.  The  limiting  factor,  however,  is  not  the 
technique  but  our  inadequate  Knowledge  of  the  properties  of  random  sets  and  our 
inability  to  develop  efficient  merge  procedures  to  make  divide-and -conquer  productive. 
As  an  example,  no  method  is  now  Known  to  find  the  hull  of  the  union  of  four-dimensional 
polyhedra  in  less  than  quadratic  expected  time.  (Quadratic  time  is  required  in  the  worst 
case.  See  [8].)  Likewise,  the  expected  value  of  the  square  of  the  number  of  vertices  of 
the  hull  of  N points  chosen  uniformly  in  a four-dimensional  hypersphere  in  not  known  to 
be  less  than  OtN°'^).  We  are  thus  unable  to  give  an  algorithm  for  the  four-dimensional 
convex  hull  whose  expected  running  time  is  provably  less  than  0(N®^®). 
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